Abstract


A  very  important  problem  in  numerical  optimization  is  to  find  a  way 
to  update  a  sparse  Hessian  approximation  so  that  it  will  be  positive 
definite  under  reasonable  circumstances.  This  problem  has  motivated 
research,  which  is  yet  to  show  much  progress,  toward  a  "sparse  BFGS 
method."  In  this  paper,  we  suggest  a  different  approach  to  the  problem 
based  on  using  a  sparse  Broyden,  or  Schubert,  update  directly  on  the 
Cholesky  factor  of  the  current  Hessian  approximation  to  define  the  next 
Hessian  approximation  implicitly  in  terms  of  its  Cholesky  factorization. 
This  approach  has  the  added  advantage  of  being  able  to  cheaply  find  the 
Newton  step,  since  no  factorization  step  is  required.  The  difficulty 
with our approach is in finding a satisfactory secant or quasi-Newton
condition  to  use  in  the  update. 


Key words


quasi-Newton methods, continuous minimization, sparse Hessians.

1.  Introduction


Let $f: \mathbb{R}^n \to \mathbb{R}$ and consider the problem of finding a local minimizer
of $f$. Often, a solution to this problem can be obtained using a quasi-Newton
method, which is basically an iterative procedure of the form

$$x_{k+1} = x_k - B_k^{-1} \nabla f(x_k), \qquad k = 0, 1, 2, \ldots$$

where $\nabla f(x_k)$ is the gradient of $f$ at $x_k$ and $B_k$ is some approximation to
the Hessian of $f$ at $x_k$. The sequence of matrices $\{B_k\}$ is often generated
by the least-change secant update (l.c.s.u.) approach, i.e., in
going from $B_k$ to $B_{k+1}$ we want to change $B_k$ as little as possible in
some sense while preserving its structure, e.g. symmetry, sparsity,
positive definiteness, and forcing $B_{k+1}$ to satisfy the secant equation

$$B_{k+1} s_k = y_k,$$

where $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. For further details on the l.c.s.u.
idea the reader is referred to Dennis and Schnabel (1979).


For small dense problems the BFGS update developed independently by
Broyden (1970), Fletcher (1970), Goldfarb (1970), and Shanno (1970),

$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k},$$

has  been  found  to  be  the  best  among  the  class  of  l.c.s.u.  that  preserve 
symmetry  and  positive  definiteness.  However,  for  larger  problems  with 
sparse  Hessians,  the  situation  is  less  clear.  Recently,  a  considerable 
amount of work has gone into extending the known dense updates to
preserve sparsity using the least-change secant update framework (Marwil
1978; Toint 1977, 1978, 1979, 1981; Shanno 1981; Dennis and Schnabel 1979).
The resulting updates preserve symmetry and sparsity and satisfy the
secant equation. However, they do not necessarily maintain positive
definiteness, and they exhibit rather unsatisfactory performance in
practice. An interesting different approach is developed in Griewank
and Toint (1982a,b,c, 1983). Thus, a major problem in this area is how
to generate sparse symmetric positive-definite secant updates that perform
well in practice.

In  this  report,  we  will  point  out  an  approach  to  derive  a  family  of 
sparse  symmetric  positive-definite  secant  updates.  The  motivation  for 
this work is based on the following derivation of the BFGS update by
Dennis  and  Schnabel  (1981): 

Assuming $B_c = L_c L_c^T$ is the current symmetric positive-definite approximation
to the Hessian of $f$, obtain $B_+$ as follows:

(1) For arbitrary $v \in \mathbb{R}^n$, solve for

$$J_+ = \operatorname{argmin} \|J - L_c\|_F \quad \text{s.t.} \quad Jv = y.$$

(2) Solve for $v$ so that $J_+^T s = v$.

The solution $J_+$ is the Broyden update (Broyden 1965) of $L_c$ sending

$$v = \frac{(y^T s)^{1/2}}{\|L_c^T s\|_2}\, L_c^T s$$

to $y$. Dennis and Schnabel (1981) prove that $B_+ = J_+ J_+^T$ is exactly
the BFGS update of $B_c = L_c L_c^T$. From $B_+ = J_+ J_+^T$, we can get the Cholesky
factors of $B_+$ in $O(n^2)$ operations by forming the LQ factorization $J_+ = L_+ Q$ of
$J_+$ without ever forming $B_+$ (Goldfarb 1976), since

$$B_+ = J_+ J_+^T = L_+ Q Q^T L_+^T = L_+ L_+^T.$$
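As a numerical illustration of the derivation above, the following Python sketch applies the Broyden update to a small dense factor and compares $J_+ J_+^T$ with the BFGS formula of Section 1. The helper names (`bfgs_via_factor`, `bfgs_direct`, and so on) and the 3-by-3 data are assumptions made for this example only, and a plain Cholesky factorization of $J_+ J_+^T$ stands in for the $O(n^2)$ LQ step, which is not implemented here.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cholesky(B):
    """Lower triangular L with L L^T = B (B symmetric positive definite)."""
    n = len(B)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(B[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (B[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def bfgs_via_factor(L_c, s, y):
    """J_+ = L_c + (y - L_c v) v^T / (v^T v), with v chosen as in step (2)."""
    Lts = matvec(transpose(L_c), s)                      # L_c^T s
    alpha = math.sqrt(dot(y, s) / dot(Lts, Lts))         # needs y^T s > 0
    v = [alpha * t for t in Lts]
    r = [yi - ui for yi, ui in zip(y, matvec(L_c, v))]   # y - L_c v
    vtv = dot(v, v)
    n = len(s)
    return [[L_c[i][j] + r[i] * v[j] / vtv for j in range(n)] for i in range(n)]

def bfgs_direct(B, s, y):
    """BFGS update applied to B itself, for comparison."""
    Bs = matvec(B, s)
    ys, sBs = dot(y, s), dot(s, Bs)
    n = len(s)
    return [[B[i][j] + y[i] * y[j] / ys - Bs[i] * Bs[j] / sBs for j in range(n)]
            for i in range(n)]

# Illustrative data (not from the report); y^T s = 6 > 0.
L_c = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [0.5, -1.0, 1.5]]
B_c = matmul(L_c, transpose(L_c))
s = [1.0, -2.0, 0.5]
y = [3.0, -1.0, 2.0]

J_plus = bfgs_via_factor(L_c, s, y)
B_from_factor = matmul(J_plus, transpose(J_plus))
B_bfgs = bfgs_direct(B_c, s, y)
L_plus = cholesky(B_from_factor)   # dense stand-in for the LQ factorization of J_+
```

In this sketch `B_from_factor` and `B_bfgs` agree to rounding error, and `L_plus` is a lower triangular factor of the updated matrix.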

Note that what we really want all along is the Cholesky factorization
of $B_+$, not $B_+$ itself. Thus it seems reasonable to update $L_c$ by a
sparse Broyden or Schubert formula (Broyden 1971, Dennis and Schnabel
1979, Marwil 1979, Schubert 1970) to get $L_+$ directly, using the l.c.s.u.
idea as in Dennis and Marwil (1982). This way we can ensure that the
resulting $B_+ = L_+ L_+^T$ is positive definite and symmetric, and we can also
preserve the sparsity structure of $L_c$, if there is any. In Section 2, we
will consider some instances of this updating scheme. Section 3 outlines
our test algorithm for the minimization problem together with the new
update. In Section 4, we will discuss some of the computational results.
Finally, Section 5 is a brief look at some possible future work.

2.  The Updating Method

Let $Q(z_1, z_2)$ denote $\{M \in \mathbb{R}^{n \times n} : M z_2 = z_1\}$ for $z_1, z_2 \in \mathbb{R}^n$. Let $\mathcal{L}$
denote the space of lower triangular matrices with some fixed sparsity
pattern chosen from the sparsity of the Hessian as in George and Liu
(1981).

An interesting approach to the problem of finding a sparse symmetric
positive-definite update would be to solve the following problem:

Problem 1. Given $y, s \in \mathbb{R}^n$ such that $y^T s > 0$, find

$$L_+ = \operatorname{argmin} \|L - L_c\|_F \quad \text{s.t.} \quad L \in \mathcal{L} \text{ and } L L^T \in Q(y, s).$$

At the moment, a computationally viable solution to the problem is not
obvious (see Greenstadt (1983)). Hence we seek a related but "easier"
problem by sparsifying the BFGS derivation of Section 1. The following
notation will be very useful:

$P_X(\cdot)$ represents the orthogonal projection of $(\cdot)$ onto $X$ in the
Frobenius norm;

$z_j$ denotes the vector $z$ with the sparsity of the $j$th row of $\mathcal{L}$;

$(z_j)_i$ is the $i$th component of $z_j$;

$z^j$ is the vector $z$ with the sparsity of the $j$th column of $\mathcal{L}$;

$(z_j^T z_j)^+$ is the pseudo-inverse of $z_j^T z_j$;

and

$e_j$ is the $j$th unit vector.

If we try to modify the BFGS derivation in the most straightforward
way, then we run into apparent difficulty as follows:

(1) For arbitrary $v \in \mathbb{R}^n$,

$$L_+ = \operatorname{argmin} \|L - L_c\|_F \quad \text{s.t.} \quad L \in \mathcal{L} \cap Q(y, v)$$

is solved by the appropriate sparse Broyden update

$$L_+ = L_c + \sum_{i=1}^{n} (v_i^T v_i)^+ e_i^T (y - L_c v)\, e_i v_i^T$$

(if (1) has a solution; more on this later).

(2) Solve for $v$ such that $v = L_+^T s$. But this yields the rather formidable
system

$$v = L_c^T s + \sum_{i=1}^{n} (v_i^T v_i)^+ e_i^T (y - L_c v)\, (e_i^T s)\, v_i$$

to solve for $v$.
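The sparse Broyden (Schubert) update of step (1) acts on one row at a time, which the following Python sketch tries to make concrete. The function name `sparse_broyden_update`, the boolean `pattern` representation of the sparsity structure, and the 3-by-3 data are assumptions for this example; a row whose masked vector $v_i$ is zero is left unchanged, matching the pseudo-inverse convention.

```python
def sparse_broyden_update(L_c, pattern, v, y):
    """Row-wise Schubert-style update of L_c.

    pattern[i][j] is True where row i of the factor may be nonzero.
    Row i is corrected along v_i (v masked to the sparsity of row i) so that
    (L_+ v)_i = y_i whenever v_i is nonzero.
    """
    n = len(v)
    L_plus = [row[:] for row in L_c]
    Lv = [sum(L_c[i][j] * v[j] for j in range(n)) for i in range(n)]
    for i in range(n):
        v_i = [v[j] if pattern[i][j] else 0.0 for j in range(n)]
        norm2 = sum(t * t for t in v_i)
        if norm2 == 0.0:
            continue  # pseudo-inverse of zero is zero: leave the row alone
        coef = (y[i] - Lv[i]) / norm2
        for j in range(n):
            L_plus[i][j] += coef * v_i[j]
    return L_plus

# Illustrative data (not from the report): lower triangular with entry (3,1) forced to zero.
pattern = [[True, False, False],
           [True, True, False],
           [False, True, True]]
L_c = [[2.0, 0.0, 0.0],
       [1.0, 3.0, 0.0],
       [0.0, -1.0, 1.5]]
v = [1.0, 2.0, -1.0]
y = [4.0, 5.0, 0.5]

L_plus = sparse_broyden_update(L_c, pattern, v, y)
L_plus_v = [sum(L_plus[i][j] * v[j] for j in range(3)) for i in range(3)]
```

Here `L_plus_v` reproduces `y` while every entry outside `pattern` stays zero. The difficulty described in step (2) is that $v$ itself must also equal $L_+^T s$, which this update does not enforce.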

Instead of trying to solve for $v$, we will turn the problem around
and try to incorporate $v = L_+^T s$ into the variational problem whose
solution defines $L_+$. This leads to the following problem:

Problem 2. Given $v \ne 0$, $y$, $s \in \mathbb{R}^n$ such that $v^T v = y^T s$, find

$$L_+ = \operatorname{argmin} \|L - L_c\|_F \quad \text{s.t.} \quad L \in Q(y, v) \cap \mathcal{L} \cap Q(v, s)^T,$$

where $Q(v, s)^T$ is the set of transposes of matrices in $Q(v, s)$.

The remainder of this section is devoted to the presentation of
results on the solution to Problem 2. In particular, $v^T v = y^T s$ is shown
to be necessary for feasibility, although it is quite unlikely to be
sufficient. In the event that Problem 2 is not feasible, i.e.
$Q(y, v) \cap \mathcal{L} \cap Q(v, s)^T = \emptyset$, we provide a generalized inverse type solution.

Theorem 2.1. If $\nabla^2 f(x)$ is symmetric, positive definite and has
the same sparsity pattern for every $x \in [x_c, x_+]$; and if every symmetric
positive-definite matrix with that same sparsity pattern has a Cholesky
factor in $\mathcal{L}$, then there exists some $v \in \mathbb{R}^n$ such that Problem 2 has a
solution. If $v \in \mathbb{R}^n$ and $Q(y, v) \cap Q(v, s)^T \ne \emptyset$, then $y^T s = v^T v \ge 0$, with
equality only for $y = 0$.

Proof: The proof is accomplished by drawing slightly different conclusions
from standard arguments. We write

$$y = \left[\int_0^1 \nabla^2 f(x_c + ts)\, dt\right] s \equiv Bs,$$

where $B$ is a symmetric positive-definite matrix with the sparsity of the
Hessian. Thus by hypothesis, for some $L \in \mathcal{L}$, $B = LL^T$. Set $v = L^T s$, and
note that $L \in Q(y, v)$ with $v = 0$ only if $y = 0$ and

$$y^T s = (L L^T s)^T s = (L^T s)^T (L^T s) = v^T v \ge 0.$$

Thus $L$ is a feasible point for the Problem 2 corresponding to this $v$,
and by standard arguments, feasibility is enough to ensure a solution.
If the reader is unfamiliar with such existence results, then we give
the explicit solution in the next theorem.

Now to see the necessity of the condition that $v^T v = y^T s \ge 0$ for
feasibility, let $J \in Q(y, v) \cap Q(v, s)^T$. Then,

$$0 \le v^T v = (J^T s)^T v = s^T J v = s^T y = y^T s.$$

If $v^T v = 0$, then $v = 0$, so $y = Jv = 0$ and $J^T s = v = 0$.
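The construction used in the proof can be checked by hand on a small case. The Python sketch below uses an arbitrary integer lower triangular $L$ and vector $s$ (both invented for this example) so that the arithmetic is exact: it sets $v = L^T s$ and $y = Lv$, so that $L \in Q(y, v)$ and $L^T \in Q(v, s)$, and then compares $y^T s$ with $v^T v$.

```python
# Illustrative integer data (not from the report).
L = [[2, 0, 0],
     [1, 3, 0],
     [0, -1, 2]]
s = [1, -2, 1]
n = len(s)

# v = L^T s, so L^T belongs to Q(v, s).
v = [sum(L[k][i] * s[k] for k in range(n)) for i in range(n)]
# y = L v = (L L^T) s, so L belongs to Q(y, v).
y = [sum(L[i][k] * v[k] for k in range(n)) for i in range(n)]

yts = sum(a * b for a, b in zip(y, s))
vtv = sum(a * a for a in v)
```

With these numbers `v` is `[0, -7, 2]`, `y` is `[0, -21, 11]`, and both `yts` and `vtv` equal 53, as the theorem requires.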


Short of the impractical construction of the proof, we don't know
how to find a $v$ for which Problem 2 is feasible. Our next theorem, a
corollary of results in Dennis and Schnabel (1979), gives a formula for
$L_+ \in \mathcal{L}$ which solves Problem 2 whenever it has a solution; when it
doesn't have a real solution, the formula gives a generalized solution.


Theorem 2.2. Let $P$ be the matrix whose $j$th column is given by

$$P e_j = \left[\frac{v_j^T v_j}{v^T v}\right] e_j - \sum_{i=1}^{j} (s^{iT} s^i)^+ \left[\frac{(v_j)_i\, (v)_i\, (s)_j}{v^T v}\right] s^i, \qquad (2.1)$$

where $(v)_i$ and $(s)_j$ denote the $i$th and $j$th components of $v$ and $s$,
and let $w$ be any solution to the least squares problem

$$\min_{w \in \mathbb{R}^n} \|Pw - (y - P_A(L_c) v)\|_2,$$

where

$$P_A(L_c) = L_c + \sum_{i=1}^{n} (s^{iT} s^i)^+ e_i^T (v - L_c^T s)\, s^i e_i^T.$$

Then in the Frobenius norm,

$$L_+ = L_c + \sum_{i=1}^{n} \left\{ \frac{(v)_i}{v^T v} w^i + (s^{iT} s^i)^+ \left[ e_i^T (v - L_c^T s) - \frac{(v)_i}{v^T v} s^{iT} w^i \right] s^i \right\} e_i^T \qquad (2.2)$$

is an element of $\mathcal{L}$ closest to $Q(v, s)^T$. Among these closest points, it
is also a closest point to $Q(y, v)$. Among all such points, it is the
unique closest point to $L_c$. Thus, if Problem 2 is feasible, then $L_+$ is
its unique solution.

Proof: It will be convenient to establish some further notation. We
will always use closest or nearest to mean in the Frobenius norm. Let

$$A = M(\mathcal{L}^T, Q(v, s))^T \equiv \{L \in \mathcal{L} : L^T \text{ is a nearest point to } Q(v, s)\}.$$

It is shown in Dennis and Schnabel (1979) that for $A_1, A_2$ affine,
$M(A_1, A_2)$ is affine and it is $A_1 \cap A_2$ if the intersection is nonempty.
Thus, $A$ is affine and its parallel subspace is $S = [\mathcal{L}^T \cap Q(0, s)]^T$.

From Theorem 4.5 in Dennis and Schnabel (1979),

$$P_S\!\left(\frac{w v^T}{v^T v}\right) = P_{\mathcal{L}}\!\left(\frac{w v^T}{v^T v}\right) - \sum_{i=1}^{n} (s^{iT} s^i)^+ \left[ s^{iT} P_{\mathcal{L}}\!\left(\frac{w v^T}{v^T v}\right) e_i \right] s^i e_i^T,$$

where

$$P_{\mathcal{L}}\!\left(\frac{w v^T}{v^T v}\right) = \sum_{i=1}^{n} \frac{(v)_i}{v^T v}\, w^i e_i^T,$$

so that

$$P_S\!\left(\frac{w v^T}{v^T v}\right) = \sum_{i=1}^{n} \frac{(v)_i}{v^T v}\, w^i e_i^T - \sum_{i=1}^{n} (s^{iT} s^i)^+ \frac{(v)_i}{v^T v}\, (s^{iT} w^i)\, s^i e_i^T.$$

Thus,

$$P_A(L_c) = L_c + \sum_{i=1}^{n} (s^{iT} s^i)^+ e_i^T (v - L_c^T s)\, s^i e_i^T.$$

Also from Theorem 3.2 in Dennis and Schnabel (1979), we know that

$$L_+ = P_A(L_c) + P_S\!\left(\frac{w v^T}{v^T v}\right), \qquad (2.3)$$

where $w$ is any solution to the least squares problem

$$\min_{w \in \mathbb{R}^n} \|Pw - (y - P_A(L_c) v)\|_2,$$

with the $j$th column of $P$ given by

$$P e_j = P_S\!\left(\frac{e_j v^T}{v^T v}\right) v \quad \text{for } j = 1, 2, \ldots, n. \qquad (2.4)$$


It is straightforward to show this to be (2.1), but we will give
the proof since some facts about projectors are used. First note that
once we have done so, the proof of the theorem is complete because (2.2)
can be obtained by substituting the expressions for the projections into
(2.3).


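Now

$$P_{\mathcal{L}}\!\left(\frac{e_j v^T}{v^T v}\right) = \frac{e_j v_j^T}{v^T v}.$$

Hence,

$$P_S\!\left(\frac{e_j v^T}{v^T v}\right) = \left[ P_{S^T}\!\left( P_{\mathcal{L}^T}\!\left(\frac{v e_j^T}{v^T v}\right) \right) \right]^T$$

$$= \left[ P_{\mathcal{L}^T}\!\left(\frac{v e_j^T}{v^T v}\right) - \sum_{i=1}^{n} (s^{iT} s^i)^+ e_i^T \left( P_{\mathcal{L}^T}\!\left(\frac{v e_j^T}{v^T v}\right) s \right) e_i (s^i)^T \right]^T$$

$$= \left[ \frac{v_j e_j^T}{v^T v} - \sum_{i=1}^{n} (s^{iT} s^i)^+ \left[ \frac{e_i^T v_j\, e_j^T s}{v^T v} \right] e_i (s^i)^T \right]^T.$$

Consequently by (2.4), the $j$th column of $P$ is

$$P e_j = \left[\frac{v_j^T v}{v^T v}\right] e_j - \sum_{i=1}^{n} (s^{iT} s^i)^+ \left[\frac{(v_j)_i\, (v)_i\, (s)_j}{v^T v}\right] s^i$$

$$= \left[\frac{v_j^T v_j}{v^T v}\right] e_j - \sum_{i=1}^{j} (s^{iT} s^i)^+ \left[\frac{(v_j)_i\, (v)_i\, (s)_j}{v^T v}\right] s^i,$$

because $v_j$ is zero in every coordinate past the $j$th. This completes
the proof.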
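The equivalence of (2.1) and (2.4) shown in the proof can also be checked numerically. The Python sketch below builds each column of $P$ twice, once from the closed form (2.1) and once by explicitly projecting $e_j v^T / v^T v$ onto the pattern and then onto $\{L : L^T s = 0\}$ before multiplying by $v$. The function names, the boolean `pattern` representation (assumed lower triangular), and the data are hypothetical and chosen for this example only.

```python
def mask_row(z, pattern, j):
    """z_j: z with the sparsity of row j of the pattern."""
    return [z[k] if pattern[j][k] else 0.0 for k in range(len(z))]

def mask_col(z, pattern, i):
    """z^i: z with the sparsity of column i of the pattern."""
    return [z[k] if pattern[k][i] else 0.0 for k in range(len(z))]

def pinv_scalar(a):
    return 0.0 if a == 0.0 else 1.0 / a

def P_column_formula(j, v, s, pattern):
    """Column j of P from the closed form (2.1)."""
    n = len(v)
    vtv = sum(t * t for t in v)
    v_j = mask_row(v, pattern, j)
    col = [0.0] * n
    col[j] = sum(t * t for t in v_j) / vtv
    for i in range(j + 1):
        s_i = mask_col(s, pattern, i)
        weight = pinv_scalar(sum(t * t for t in s_i)) * v_j[i] * v[i] * s[j] / vtv
        for k in range(n):
            col[k] -= weight * s_i[k]
    return col

def P_column_definition(j, v, s, pattern):
    """Column j of P from (2.4): P_S(e_j v^T / v^T v) v."""
    n = len(v)
    vtv = sum(t * t for t in v)
    # Projection onto the sparsity pattern: only row j of e_j v^T survives.
    M = [[(v[k] / vtv if (r == j and pattern[r][k]) else 0.0) for k in range(n)]
         for r in range(n)]
    # Projection onto {L : L^T s = 0}, one column at a time.
    for i in range(n):
        s_i = mask_col(s, pattern, i)
        coef = pinv_scalar(sum(t * t for t in s_i)) * sum(s_i[r] * M[r][i] for r in range(n))
        for r in range(n):
            M[r][i] -= coef * s_i[r]
    return [sum(M[r][k] * v[k] for k in range(n)) for r in range(n)]

# Illustrative data (not from the report).
pattern = [[True, False, False],
           [True, True, False],
           [False, True, True]]
v = [1.0, 2.0, -1.0]
s = [1.0, -2.0, 0.5]

P_formula = [P_column_formula(j, v, s, pattern) for j in range(3)]
P_definition = [P_column_definition(j, v, s, pattern) for j in range(3)]
```

The two lists of columns agree to rounding error for this pattern, which is what the last step of the proof asserts.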
If Problem 2 is not feasible, then Theorem 2.2 gave a solution to
the generalized problem

$$L_+ = \operatorname{argmin} \|L - L_c\|_F \quad \text{s.t.} \quad L \in M(M(\mathcal{L}, Q(v, s)^T), Q(y, v)).$$

We could have chosen to solve another generalization of Problem 2:

$$L_+ = \operatorname{argmin} \|L - L_c\|_F \quad \text{s.t.} \quad L \in M(M(\mathcal{L}, Q(y, v)), Q(v, s)^T).$$

The solution to this problem can be obtained using the same principles
as in the proof of Theorem 2.2.

Hence, if we have some reasonable choice rule for $v$, then $B_+ = L_+ L_+^T$ with $L_+$
given by (2.2) would give a sparse symmetric positive-definite update of
$B_c = L_c L_c^T$.

3.  Algorithm

In  this  section,  we  will  outline  our  test  algorithm  for  the  minimi¬ 
zation  problem  using  our  updating  procedure. 


Given $x_0$, $L_0$. For $k = 0, 1, 2, \ldots$

(1) Compute $f(x_k)$, $\nabla f(x_k)$ and test for convergence.

(2) Calculate $p_k$: $L_k L_k^T p_k = -\nabla f(x_k)$.

(3) Calculate a steplength $\lambda_k$ satisfying the line search criteria.

(4) Set $x_{k+1} = x_k + \lambda_k p_k$, $s = x_{k+1} - x_k$, and $y = \nabla f(x_{k+1}) - \nabla f(x_k)$.

(5) Use some choice rule to pick a $v$. (More on this later.)

(6) Form the right-hand side of the least squares problem:

$$y - P_A(L_k) v = y - L_k v - \sum_{i=1}^{n} (s^{iT} s^i)^+ e_i^T (v - L_k^T s)\, (v)_i\, s^i.$$

(7) Accumulate $P$ column by column using (2.1).

(8) Solve the least squares problem for $w$:

$$Pw = y - P_A(L_k) v.$$

(9) Update $L_k$ to get $L_{k+1}$ using (2.2).
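Because the algorithm carries the factor $L_k$ rather than $B_k$, step (2) needs only two triangular solves and no factorization. The Python sketch below illustrates this for a small dense lower triangular factor; the function names and data are assumptions for the example, and a sparse implementation would skip the structurally zero entries.

```python
def forward_solve(L, b):
    """Solve L z = b for lower triangular L."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z

def backward_solve_transpose(L, z):
    """Solve L^T p = z using the entries of L directly."""
    n = len(z)
    p = [0.0] * n
    for i in reversed(range(n)):
        p[i] = (z[i] - sum(L[k][i] * p[k] for k in range(i + 1, n))) / L[i][i]
    return p

def newton_step(L, grad):
    """Step (2): solve L L^T p = -grad by one forward and one backward solve."""
    z = forward_solve(L, [-g for g in grad])
    return backward_solve_transpose(L, z)

# Illustrative data (not from the report).
L_k = [[2.0, 0.0, 0.0],
       [1.0, 3.0, 0.0],
       [0.5, -1.0, 1.5]]
grad = [1.0, -2.0, 3.0]
p_k = newton_step(L_k, grad)

# Residual of L L^T p + grad, which should vanish up to rounding.
Ltp = [sum(L_k[k][i] * p_k[k] for k in range(3)) for i in range(3)]
residual = [sum(L_k[i][k] * Ltp[k] for k in range(3)) + grad[i] for i in range(3)]
```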

Some remarks should be made about steps (5) and (8).

Remark 1. For a given choice of $v$ the update $L_+$ only solves Problem 2
provided that $A \cap Q(y, v)$ is nonempty, i.e., Problem 2 is feasible.
Moreover, since we trust the l.c.s.u. idea, we certainly hope that the
vector $v$ that we pick will give us an $L_+$ which is not too far from $L_c$.
For example, if we ignore sparsity in the lower triangular factors, then
we might want $L_+$ to satisfy

$$\|L_+ - L_c\|_F \le \|L_{BFGS} - L_c\|_F, \qquad (3.1)$$

where $L_{BFGS}$ is the lower triangular matrix obtained from the BFGS Cholesky
update procedure. There is no obvious way to choose $v$ which satisfies
both these conditions. Since

$$v = \frac{(y^T s)^{1/2}}{\|L_c^T s\|_2}\, L_c^T s$$

gives the BFGS update
when used with the BFGS procedure, it would seem reasonable to test our
updating procedure using this $v$. However, we do take sparsity into
account, at least to require $L_+$ to be lower triangular, so this choice
for $v$ cannot guarantee that $A \cap Q(y, v)$ be nonempty, and in fact, it may
not give an $L_+$ that satisfies (3.1).

If $\mathcal{L}$ is the subspace of lower triangular matrices without any
additional sparsity, then the other choice of $v$ for which both
$A \cap Q(y, v) \ne \emptyset$ and (3.1) are satisfied is $v = L_{BFGS}^T s$. The truth of this
statement is obvious since $L_{BFGS}$ itself belongs to $A \cap Q(y, v)$. We want
to point out that this choice for $v$ is not a practical one since it
requires that we know $L_{BFGS}$. However, since we are trying to test the
usefulness of applying l.c.s.u. to the factor, this $v$ is interesting for
our purposes.

Remark 2. The solution of the least squares problem in step 8 is important
in our procedure. In the testing, we used the SVD to solve this
problem and found that the matrix $P$ is always ill-conditioned, with condition
number ranging between $10^9$ and $10^{18}$. In all cases tested, $P$
usually had $n-1$ "nice" singular values and one relatively bad one. We

used the following criterion to determine the numerical rank of $P$. Let
$\sigma_i$ denote the $i$th singular value of $P$; then

$$\sigma_i = 0 \quad \text{if} \quad \sigma_i < \sqrt{\text{macheps}}\; \|P\|_1,$$

where macheps $\approx 10^{-18}$ is the machine epsilon of the arithmetic used.
Though this seems reasonable, we have observed that in some cases our
minimization algorithm performs better without it. This probably indicates
that the size of the residual is more important than the well-posedness
of the least-squares problem. When some $\sigma_i$ is set to 0, (3.1)
is sometimes not satisfied even though we know that theoretically for
$v = L_{BFGS}^T s$ this could never happen.

4.  Discussion of Computational Results

The following updates were tested in the algorithm of Section 3.

(A) $L_+$ given by the BFGS Cholesky update,

(B) $L_+$ given by (2.2) with $v = L_{BFGS}^T s$,

(C) $L_+$ given by (2.2) with $v = \dfrac{(y^T s)^{1/2}}{\|L_c^T s\|_2}\, L_c^T s$.


In general, (C) gave very poor performance for the reasons that we
have already discussed; so we will shift our attention to (A) and (B).
The test problems used were the 18 problems documented in Moré et al.
(1981). Moreover, to test for robustness of the algorithms, we followed
the idea of Moré et al. in starting at $x_0 = x_s$, $10x_s$, and $100x_s$ respectively,
where $x_s$ is the standard start for the test problem. This gave
us a total of 54 test cases.


All the runs were made in 18-digit arithmetic without any rescaling
of the problems and with a default maximum stepsize allowed in the line
search $\mathrm{STEPMX} = \max\{10^{[\ldots]},\ 10^{[\ldots]} \|x_0\|_2\}$. Also, convergence was assumed
when either

$$\max_i \left\{ \frac{|(x_+)_i - (x_c)_i|}{\max\{|(x_+)_i|,\ 1\}} \right\} \le \mathrm{STPTOL} = 10^{-5}$$

or

$$\max_i \left\{ \frac{|\nabla f(x_+)_i|\ \max\{|(x_+)_i|,\ 1\}}{\max\{|f(x_+)|,\ 1\}} \right\} \le \mathrm{GRDTOL} = 10^{[\ldots]}.$$
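The two stopping tests are scale-aware versions of "small step" and "small gradient". The Python sketch below is one way to code them; the function names are hypothetical, the tolerances are passed as arguments, and the value `1e-5` used in the example calls is an assumed illustrative setting rather than a value confirmed by the text.

```python
def step_converged(x_new, x_old, stptol):
    """Relative step test: max_i |x_new_i - x_old_i| / max(|x_new_i|, 1) <= stptol."""
    rel = max(abs(a - b) / max(abs(a), 1.0) for a, b in zip(x_new, x_old))
    return rel <= stptol

def gradient_converged(grad, x, fx, grdtol):
    """Relative gradient test: max_i |g_i| max(|x_i|, 1) / max(|f(x)|, 1) <= grdtol."""
    scaled = max(abs(g) * max(abs(xi), 1.0) for g, xi in zip(grad, x))
    return scaled / max(abs(fx), 1.0) <= grdtol

# Illustrative values (not from the report).
small_step = step_converged([1.0, 200.0], [1.000001, 200.001], 1e-5)
tight_step = step_converged([1.0, 200.0], [1.000001, 200.001], 1e-6)
small_grad = gradient_converged([1e-7, 2e-8], [1.0, 200.0], 3.0, 1e-5)
large_grad = gradient_converged([1e-3, 0.0], [1.0, 200.0], 3.0, 1e-5)
```

Dividing by `max(|x_i|, 1)` and `max(|f(x)|, 1)` keeps the tests meaningful both for large iterates and for iterates or function values near zero.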

In a lot of cases, (B) is certainly competitive with (A). However,
the following observations were made in the cases where (B) performs
poorly:

(1) Drastic changes in the Newton step occur frequently.

(2) (B) is not robust in the sense that it performs very poorly far
away from the solution, e.g. when $x_0 = 100x_s$.

(3) The steps generated by (B) usually do not lead to a decrease in
function value as large as does the BFGS update. Even if we add a
$\beta$-condition to our line search, i.e., [...], this
behavior persists no matter how accurate the line search is chosen
to be, e.g. for $\beta = 0.01$.

(4) If we switch from (B) to (A) using whatever information the algorithm
has accumulated up to that point, then the BFGS update always
seems to be able to converge to the solution quite rapidly from
what is supposed to be "bad" information.

(5) Let $\mathrm{DIFF} = \|L_+ - L_{BFGS}\|_F$. We observe that in the
case where (B) is doing poorly, DIFF is always relatively large;
when (B) performs better than or as well as the BFGS update, DIFF is
always relatively small. Hence (B) is only good when $L_+$ is near
$L_{BFGS}$. Does this say that doing a least change secant update of $L_c$
is not sensible? We don't really know since it only tells us that
our procedure does not give a reasonable update for the choice of
$v = L_{BFGS}^T s$. The following is a possible explanation of this
behavior. Since we are at $L_c$, we need to compute $L_{BFGS}$ to get
$v = L_{BFGS}^T s$. Hence in the case where $L_{BFGS}$ is far away from $L_c$, the
information that we use to obtain $L_+$ will not reflect the current
information contained in $L_c$. It would seem more reasonable to have
a choice rule that used information at the current step to determine
a $v$ that satisfies both (3.1) and $A \cap Q(y, v) \ne \emptyset$.

Although this discussion is based on extensive computational
results, we feel that there is little point to including actual numbers
of function and gradient evaluations here. Our experiments are
continuing and it is our plan to publish all test results together, if
at all.


5.  Conclusions

We would really like to be able to solve Problem 1, but we considered
the apparently more tractable Problem 2, and we have developed
what should be a reasonable way to update a sparse Cholesky factorization
by updating $L_c$ to get $L_+$ directly. Unfortunately, the whole procedure
is based on the choice of the vector $v$. It is reasonable to conjecture
that if our idea has merit, then the obstacle is the determination
of $v$ as mentioned at the end of the last section. This is somewhat
similar to the problem of making a useful partially separable decomposition
in the Griewank-Toint approach (Griewank and Toint 1983). We also
would need to find a cheap way to solve the least squares problem in
Step 8 of the algorithm to make our algorithm useful. There are other
ways to simplify Problem 1 that may turn out to be more useful; for
example, we might linearize the constraint $L_+ L_+^T s = y$ in $L_+$. (See
Greenstadt, 1983.)